Summary. In this article we study asymptotic properties of weighted samples produced by the
auxiliary particle filter (APF) proposed by Pitt and Shephard (1999a). Besides establishing a
central limit theorem (CLT) for smoothed particle estimates, we also derive bounds on the $L^p$
error and bias of the same for a finite particle sample size. By examining the recursive formula
for the asymptotic variance of the CLT we identify first-stage importance weights for which the
increase of asymptotic variance at a single iteration of the algorithm is minimal. In the light of
these findings, we discuss and demonstrate on several examples how the APF algorithm can
be improved.

1. Introduction 

In this paper we consider a state space model where a sequence $Y = \{Y_k\}_{k=0}^{\infty}$ is modeled
as a noisy observation of a Markov chain $X = \{X_k\}_{k=0}^{\infty}$, called the state sequence, which is
hidden. The observed values of $Y$ are conditionally independent given the hidden states $X$,
and the corresponding conditional distribution of $Y_k$ depends on $X_k$ only. When operating
on a model of this form, the joint smoothing distribution, that is, the joint distribution of
$(X_0, \ldots, X_n)$ given $(Y_0, \ldots, Y_n)$, and its marginals will be of interest. Of particular interest
is the filter distribution, defined as the marginal of this law with respect to the component $X_n$.
Computing these posterior distributions will be the key issue when filtering
the hidden states as well as performing inference on unknown model parameters. The
posterior distribution can be recursively updated as new observations become available,
making single-sweep processing of the data possible, by means of the so-called smoothing
recursion. However, in general this recursion cannot be applied directly since it involves
the evaluation of complicated high-dimensional integrals. In fact, closed form solutions
are obtainable only for linear/Gaussian models (where the solutions are acquired using the
disturbance smoother) and models where the state space of the latent Markov chain is finite.

Sequential Monte Carlo (SMC) methods, often alternatively termed particle filters, pro- 
vide a helpful tool for computing approximate solutions to the smoothing recursion for 
general state space models, and the field has seen a drastic increase in interest over recent 
years. These methods are based on the principle of, recursively in time, approximating the 
smoothing distribution with the empirical measure associated with a weighted sample of 
particles. At present time there are various techniques for producing and updating such a 
particle sample (see Fearnhead, 1998; Doucet et al., 2001; Liu, 2001). For a comprehensive 
treatment of the theoretical aspects of SMC methods we refer to the work by Del Moral 
(2004). 

In this article we analyse the auxiliary particle filter (APF) proposed by Pitt and Shephard
(1999a), which has proved to be one of the most useful and widely adopted implementations
of the SMC methodology. Unlike the traditional bootstrap particle filter (Gordon et
al., 1993), the APF enables the user to affect the particle sample allocation by designing
freely a set of first-stage importance weights involved in the selection procedure. Typically,
this has been used for assigning large weight to particles whose offspring are likely
to end up in zones of the state space having high posterior probability. Despite its obvious
appeal, it is however not clear how to optimally exploit this additional degree of freedom.

In order to better understand this issue, we present an asymptotic analysis of the algorithm
(a continuation of Olsson et al., 2006, based on recent results by Chopin, 2004; Künsch,
2005; Douc and Moulines, 2005, on weighted systems of particles). More
specifically, we establish CLTs (Theorems 3.1 and 3.2), with explicit expressions of the
asymptotic variances, for two different versions (differentiated by the absence/presence of a
concluding resampling pass at the end of each loop) of the algorithm under general model
specifications. The convergence is with respect to an increasing number of particles, and a recent
result in the same spirit has, independently of Olsson et al. (2006), been stated in the
manuscript by Doucet and Johansen (2007). Using these results, we also identify first-stage
importance weights which are asymptotically most efficient; this is the main
contribution of the paper. This result provides important insights into optimal sample allocation for
particle filters in general, and we also give an interpretation of the finding in terms of
variance reduction for stratified sampling.

In addition, we prove (utilising a decomposition of the Monte Carlo error proposed
by Del Moral (2004) and refined by Olsson et al. (2005)) time uniform convergence in $L^p$
(Theorem 3.3) under more stringent assumptions of ergodicity of the conditional hidden
chain. With support of this stability result and the asymptotic analysis we conclude that
inserting a final selection step at the end of each loop is superfluous, at least as long as the
numbers of particles used in the two stages agree, since such an operation exclusively
increases the asymptotic variance.

Finally, in the implementation section (Section 5) several heuristics, derived from the ob- 
tained results, for designing efficient first-stage weights are discussed, and the improvement 
implied by approximating the asymptotically optimal first-stage weights is demonstrated 
on several examples. 



2. Notation and basic concepts 

2.1. Model description

We denote by $(\mathsf{X}, \mathcal{X})$, $Q$, and $\nu$ the state space, transition kernel, and initial distribution of
$X$, respectively, and assume that all random variables are defined on a common probability
space $(\Omega, \mathbb{P}, \mathcal{A})$. In addition we denote by $(\mathsf{Y}, \mathcal{Y})$ the state space of $Y$ and suppose that
there exists a measure $\lambda$ and, for all $x \in \mathsf{X}$, a non-negative function $y \mapsto g(y|x)$ such that,
for $k \geq 0$, $\mathbb{P}(Y_k \in A \mid X_k = x) = \int_A g(y|x)\,\lambda(dy)$, $A \in \mathcal{Y}$. Introduce, for $i \leq j$, the vector
notation $X_{i:j} = (X_i, \ldots, X_j)$; similar notation will be used for other quantities. The joint
smoothing distribution is denoted by

$$\phi_n(A) \triangleq \mathbb{P}(X_{0:n} \in A \mid Y_{0:n} = y_{0:n}), \quad A \in \mathcal{X}^{\otimes(n+1)},$$

and a straightforward application of Bayes's formula shows that

$$\phi_{k+1}(A) = \frac{\int_A g(y_{k+1}|x_{k+1})\, Q(x_k, dx_{k+1})\, \phi_k(dx_{0:k})}{\int_{\mathsf{X}^{k+2}} g(y_{k+1}|x'_{k+1})\, Q(x'_k, dx'_{k+1})\, \phi_k(dx'_{0:k})}$$

for sets $A \in \mathcal{X}^{\otimes(k+2)}$. We will throughout this paper assume that we are given a sequence
$\{y_k; k \geq 0\}$ of fixed observations, and write, for $x \in \mathsf{X}$, $g_k(x) \triangleq g(y_k|x)$. Moreover, from
now on we let the dependence on these observations of all other quantities be implicit, and
denote, since the coming analysis is made exclusively conditionally on the given observed
record, by $\mathbb{P}$ and $\mathbb{E}$ the conditional probability measure and expectation with respect to
these observations.



2.2. The auxiliary particle filter 

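Let us recall the APF algorithm by Pitt and Shephard (1999a). Assume that we at time $k$
have a particle sample $\{(\xi_{0:k}^{N,i}, \omega_k^{N,i})\}_{i=1}^N$ (each random variable $\xi_{0:k}^{N,i}$ taking values in $\mathsf{X}^{k+1}$)
providing an approximation $\phi_k^N \triangleq \sum_{i=1}^N \omega_k^{N,i} \delta_{\xi_{0:k}^{N,i}} / \Omega_k^N$ of the joint smoothing distribution $\phi_k$,
where $\Omega_k^N \triangleq \sum_{i=1}^N \omega_k^{N,i}$ and $\omega_k^{N,i} \geq 0$, $1 \leq i \leq N$. Then, when the observation $y_{k+1}$
becomes available, an approximation of $\phi_{k+1}$ is obtained by plugging the empirical measure
$\phi_k^N$ into the recursion (2.1), yielding, for $A \in \mathcal{X}^{\otimes(k+2)}$,

$$\bar\phi_{k+1}^N(A) \triangleq \sum_{i=1}^N \frac{\omega_k^{N,i} H_k^u(\xi_{0:k}^{N,i}, \mathsf{X}^{k+2})}{\sum_{j=1}^N \omega_k^{N,j} H_k^u(\xi_{0:k}^{N,j}, \mathsf{X}^{k+2})}\, H_k(\xi_{0:k}^{N,i}, A).$$
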
Here we have introduced, for $x_{0:k} \in \mathsf{X}^{k+1}$ and $A \in \mathcal{X}^{\otimes(k+2)}$, the unnormalised kernels

$$H_k^u(x_{0:k}, A) \triangleq \int_A g_{k+1}(x'_{k+1})\, \delta_{x_{0:k}}(dx'_{0:k})\, Q(x'_k, dx'_{k+1})$$

and $H_k(x_{0:k}, A) \triangleq H_k^u(x_{0:k}, A) / H_k^u(x_{0:k}, \mathsf{X}^{k+2})$. Simulating from $H_k(x_{0:k}, \cdot)$ consists in
extending the trajectory $x_{0:k} \in \mathsf{X}^{k+1}$ with an additional component being distributed
according to the optimal kernel, that is, the distribution of $X_{k+1}$ conditional on $X_k = x_k$
and the observation $Y_{k+1} = y_{k+1}$. Now, since we want to form a new weighted sample
approximating $\phi_{k+1}$, we need to find a convenient mechanism for sampling from $\bar\phi_{k+1}^N$
given $\{(\xi_{0:k}^{N,i}, \omega_k^{N,i})\}_{i=1}^N$. In most cases it is possible, but generally computationally
expensive, to simulate from $\bar\phi_{k+1}^N$ directly using auxiliary accept-reject sampling (see
Hürzeler and Künsch, 1998; Künsch, 2005). A computationally cheaper (see Künsch, 2005,
p. 1988, for a discussion of the acceptance probability associated with the auxiliary
accept-reject sampling approach) solution consists in producing a weighted sample approximating
$\bar\phi_{k+1}^N$ by sampling from the importance sampling distribution

$$\rho_{k+1}^N(A) \triangleq \sum_{i=1}^N \frac{\omega_k^{N,i}\tau_k^{N,i}}{\sum_{j=1}^N \omega_k^{N,j}\tau_k^{N,j}}\, R_k^p(\xi_{0:k}^{N,i}, A).$$

Here $\tau_k^{N,i}$, $1 \leq i \leq N$, are positive numbers referred to as first-stage weights (Pitt and
Shephard, 1999a, use the term adjustment multiplier weights) and in this article we consider
first-stage weights of type

$$\tau_k^{N,i} = t_k(\xi_{0:k}^{N,i}) \tag{2.1}$$

for some function $t_k : \mathsf{X}^{k+1} \to \mathbb{R}^+$. Moreover, the pathwise proposal kernel $R_k^p$ is, for
$x_{0:k} \in \mathsf{X}^{k+1}$ and $A \in \mathcal{X}^{\otimes(k+2)}$, of form

$$R_k^p(x_{0:k}, A) = \int_A \delta_{x_{0:k}}(dx'_{0:k})\, R_k(x'_k, dx'_{k+1})$$

with $R_k$ being such that $Q(x, \cdot) \ll R_k(x, \cdot)$ for all $x \in \mathsf{X}$. Thus, a draw from $R_k^p(x_{0:k}, \cdot)$ is
produced by extending the trajectory $x_{0:k} \in \mathsf{X}^{k+1}$ with an additional component obtained
by simulating from $R_k(x_k, \cdot)$. It is easily checked that for $x_{0:k+1} \in \mathsf{X}^{k+2}$,

$$\frac{d\bar\phi_{k+1}^N}{d\rho_{k+1}^N}(x_{0:k+1}) \propto w_{k+1}(x_{0:k+1}) \triangleq \sum_{i=1}^N \mathbb{1}_{\xi_{0:k}^{N,i}}(x_{0:k})\, \frac{g_{k+1}(x_{k+1})}{\tau_k^{N,i}}\, \frac{dQ(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}). \tag{2.2}$$

An updated weighted particle sample $\{(\tilde\xi_{0:k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^{M_N}$ targeting $\bar\phi_{k+1}^N$ is hence generated
by simulating $M_N$ particles $\tilde\xi_{0:k+1}^{N,i}$, $1 \leq i \leq M_N$, from the proposal $\rho_{k+1}^N$ and associating
with these particles the second-stage weights $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{0:k+1}^{N,i})$, $1 \leq i \leq M_N$. By the
indicator function in (2.2), only a single term of the sum will contribute to the second-stage
weight of a particle.

Finally, in an optional second-stage resampling pass a uniformly weighted particle sample
$\{(\xi_{0:k+1}^{N,i}, 1)\}_{i=1}^N$, still targeting $\bar\phi_{k+1}^N$, is obtained by resampling $N$ of the particles $\tilde\xi_{0:k+1}^{N,i}$,
$1 \leq i \leq M_N$, according to the normalised second-stage weights. Note that the number of
particles in the last two samples, $M_N$ and $N$, may be different. The procedure is now repeated
recursively (with $\omega_{k+1}^{N,i} \equiv 1$, $1 \leq i \leq N$) and is initialised by drawing $\xi_0^{N,i}$, $1 \leq i \leq N$,
independently from $\varsigma$, where $\nu \ll \varsigma$, yielding $\omega_0^{N,i} = w_0(\xi_0^{N,i})$ with $w_0(x) \triangleq g_0(x)\, d\nu/d\varsigma(x)$,
$x \in \mathsf{X}$. To summarise, we obtain, depending on whether second-stage resampling is performed
or not, the procedures described in Algorithms 1 and 2.

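Algorithm 1 Two-Stage Sampling Particle Filter (TSSPF)

Ensure: $\{(\xi_{0:k}^{N,i}, \omega_k^{N,i})\}_{i=1}^N$ approximates $\phi_k$.

1: for $i = 1, \ldots, M_N$ do   ▷ First stage
2:   draw indices $I_k^{N,i}$ from the set $\{1, \ldots, N\}$ multinomially with respect to the normalised
     weights $\omega_k^{N,j}\tau_k^{N,j} / \sum_{\ell=1}^N \omega_k^{N,\ell}\tau_k^{N,\ell}$, $1 \leq j \leq N$;
3:   simulate $\tilde\xi_{0:k+1}^{N,i}(k+1) \sim R_k[\xi_{0:k}^{N,I_k^{N,i}}(k), \cdot]$, and
4:   set $\tilde\xi_{0:k+1}^{N,i} \triangleq [\xi_{0:k}^{N,I_k^{N,i}}, \tilde\xi_{0:k+1}^{N,i}(k+1)]$ and $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{0:k+1}^{N,i})$.
5: end for
6: for $i = 1, \ldots, N$ do   ▷ Second stage
7:   draw indices $J_{k+1}^{N,i}$ from the set $\{1, \ldots, M_N\}$ multinomially with respect to the normalised
     weights $\tilde\omega_{k+1}^{N,j} / \sum_{\ell=1}^{M_N} \tilde\omega_{k+1}^{N,\ell}$, $1 \leq j \leq M_N$, and
8:   set $\xi_{0:k+1}^{N,i} \triangleq \tilde\xi_{0:k+1}^{N,J_{k+1}^{N,i}}$.
9:   Finally, reset the weights: $\omega_{k+1}^{N,i} = 1$.
10: end for
11: Take $\{(\xi_{0:k+1}^{N,i}, 1)\}_{i=1}^N$ as an approximation of $\phi_{k+1}$.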



We will use the term APF as a family name for both these algorithms and refer to them
separately as two-stage sampling particle filter (TSSPF) and single-stage auxiliary particle
filter (SSAPF). Note that by letting $\tau_k^{N,i} \equiv 1$, $1 \leq i \leq N$, in Algorithm 2 we obtain the
bootstrap particle filter suggested by Gordon et al. (1993).

The resampling steps of the APF can of course be implemented using techniques (e.g.,
residual or systematic resampling) different from multinomial resampling, leading to
straightforward adaptations not discussed here. We believe however that the results of the coming
analysis are generally applicable and extendable to a large class of selection schemes.
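Algorithm 2 Single-Stage Auxiliary Particle Filter (SSAPF)

Ensure: $\{(\tilde\xi_{0:k}^{N,i}, \tilde\omega_k^{N,i})\}_{i=1}^N$ approximates $\phi_k$.

1: for $i = 1, \ldots, N$ do
2:   draw indices $I_k^{N,i}$ from the set $\{1, \ldots, N\}$ multinomially with respect to the normalised
     weights $\tilde\omega_k^{N,j}\tau_k^{N,j} / \sum_{\ell=1}^N \tilde\omega_k^{N,\ell}\tau_k^{N,\ell}$, $1 \leq j \leq N$;
3:   simulate $\tilde\xi_{0:k+1}^{N,i}(k+1) \sim R_k[\tilde\xi_{0:k}^{N,I_k^{N,i}}(k), \cdot]$, and
4:   set $\tilde\xi_{0:k+1}^{N,i} \triangleq [\tilde\xi_{0:k}^{N,I_k^{N,i}}, \tilde\xi_{0:k+1}^{N,i}(k+1)]$ and $\tilde\omega_{k+1}^{N,i} \triangleq w_{k+1}(\tilde\xi_{0:k+1}^{N,i})$.
5: end for
6: Take $\{(\tilde\xi_{0:k+1}^{N,i}, \tilde\omega_{k+1}^{N,i})\}_{i=1}^N$ as an approximation of $\phi_{k+1}$.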
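The two procedures differ only in the optional second-stage resampling pass, so a single routine with a flag can illustrate both. The following is a minimal sketch, not the authors' implementation: the function `apf_step`, its argument names, the scalar linear Gaussian example and the observation value `Y_NEXT` are all assumptions made for illustration, and multinomial selection is done with the standard library call `random.Random.choices`.

```python
import math
import random


def apf_step(particles, weights, t_fn, sample_r, w_fn, rng,
             second_stage=False, m=None):
    """One APF iteration on trajectories stored as tuples (hypothetical helper).

    particles: list of tuples (x_0, ..., x_k); weights: nonnegative floats.
    t_fn(path) -> first-stage weight t_k(x_{0:k}) > 0.
    sample_r(x_k, rng) -> draw of x_{k+1} from the proposal R_k(x_k, .).
    w_fn(path) -> second-stage weight w_{k+1}(x_{0:k+1}).
    """
    n = len(particles)
    m = n if m is None else m
    # First stage: ancestor indices drawn with probability prop. to omega * tau.
    first = [w * t_fn(p) for p, w in zip(particles, weights)]
    ancestors = rng.choices(range(n), weights=first, k=m)
    # Mutation: extend each selected trajectory with a draw from R_k.
    proposed = [particles[a] + (sample_r(particles[a][-1], rng),)
                for a in ancestors]
    second = [w_fn(p) for p in proposed]
    if not second_stage:
        return proposed, second            # Algorithm 2 (SSAPF)
    # Optional second stage (Algorithm 1, TSSPF): resample n trajectories
    # according to the second-stage weights and reset the weights to one.
    picks = rng.choices(range(m), weights=second, k=n)
    return [proposed[j] for j in picks], [1.0] * n


# Illustrative model (assumed): X' = PHI * X + SIGMA * W, Y = X + SIGMA_V * V.
PHI, SIGMA, SIGMA_V = 0.9, 0.1, 0.1
Y_NEXT = 0.2  # made-up observation y_{k+1}


def likelihood(y, x):
    # Gaussian likelihood up to a constant factor.
    return math.exp(-0.5 * ((y - x) / SIGMA_V) ** 2)


def t_generic(path):
    # Likelihood evaluated at the predicted mean of the next state.
    return likelihood(Y_NEXT, PHI * path[-1])


def sample_prior(x, rng):
    # Prior kernel Q used as proposal R_k.
    return rng.gauss(PHI * x, SIGMA)


def w_generic(path):
    # With R_k = Q the ratio dQ/dR_k equals one, so w = g / t.
    return likelihood(Y_NEXT, path[-1]) / t_generic(path[:-1])


rng = random.Random(0)
particles = [(rng.gauss(0.0, 0.25),) for _ in range(200)]
weights = [1.0] * 200
new_particles, new_weights = apf_step(
    particles, weights, t_generic, sample_prior, w_generic, rng)
# Self-normalised estimate of the filter mean at time k + 1.
estimate = (sum(w * p[-1] for p, w in zip(new_particles, new_weights))
            / sum(new_weights))
```

Calling `apf_step(..., second_stage=True, m=500)` instead returns $N$ uniformly weighted trajectories resampled from $M_N = 500$ proposed ones, which corresponds to the TSSPF variant.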



The issue whether second-stage resampling should be performed or not has been treated
by several authors, and the theoretical results on the particle approximation stability and
asymptotic variance presented in the next section will indicate that the second-stage selection
pass should, at least for the case $M_N = N$, be cancelled, since this exclusively
increases the sampling variance. Thus, the idea that the second-stage resampling pass is
necessary for preventing the particle approximation from degenerating does not appear to
hold. Recently, a similar conclusion was reached in the manuscript by Doucet and Johansen
(2007).

The advantages of the APF not possessed by standard SMC methods are the possibility
of, firstly, choosing the first-stage weights $\tau_k^{N,i}$ arbitrarily and, secondly, letting $N$ and $M_N$
be different (TSSPF only). Appealing to common sense, SMC methods work efficiently
when the particle weights are well-balanced, and Pitt and Shephard (1999a) propose several
strategies for achieving this by adapting the first-stage weights. In some cases it is possible
to fully adapt the filter to the model (see Section 5), providing exactly equal importance
weights; otherwise, Pitt and Shephard (1999a) suggest, in the case $R_k = Q$ and $\mathsf{X} = \mathbb{R}^d$, the
generic first-stage importance weight function $t_k^{P\&S}(x_{0:k}) \triangleq g_{k+1}[\int_{\mathbb{R}^d} x'\, Q(x_k, dx')]$, $x_{0:k} \in \mathbb{R}^{k+1}$.
The analysis that follows will however show that this way of adapting the first-stage
weights is not necessarily good in terms of asymptotic (as $N$ tends to infinity) sample
variance; indeed, using first-stage weights given by $t_k^{P\&S}$ can even be detrimental for some
models.

3. Bounds and asymptotics for produced approximations 

3.1. Asymptotic properties. 

Introduce, for any probability measure $\mu$ on some measurable space $(E, \mathcal{E})$ and $\mu$-measurable
function $f$ satisfying $\int_E |f(x)|\,\mu(dx) < \infty$, the notation $\mu f \triangleq \int_E f(x)\,\mu(dx)$. Moreover,
for any two transition kernels $K$ and $T$ from $(E_1, \mathcal{E}_1)$ to $(E_2, \mathcal{E}_2)$ and $(E_2, \mathcal{E}_2)$ to $(E_3, \mathcal{E}_3)$,
respectively, we define the product transition kernel $KT(x, A) \triangleq \int_{E_2} T(z, A)\, K(x, dz)$, for
$x \in E_1$ and $A \in \mathcal{E}_3$. A set $\mathsf{C}$ of real-valued functions on $\mathsf{X}^m$ is said to be proper if the
following conditions hold: i) $\mathsf{C}$ is a linear space; ii) if $g \in \mathsf{C}$ and $f$ is measurable with
$|f| \leq |g|$, then $|f| \in \mathsf{C}$; iii) for all $c \in \mathbb{R}$, the constant function $f \equiv c$ belongs to $\mathsf{C}$.
From Douc and Moulines (2005) we adapt the following definitions.

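Definition 3.1 (Consistency). A weighted sample $\{(\xi_{0:m}^{N,i}, \omega_m^{N,i})\}_{i=1}^{M_N}$ on $\mathsf{X}^{m+1}$ is said
to be consistent for the probability measure $\mu$ and the (proper) set $\mathsf{C} \subseteq L^1(\mathsf{X}^{m+1}, \mu)$ if, for
any $f \in \mathsf{C}$, as $N \to \infty$,

$$(\Omega_m^N)^{-1} \sum_{i=1}^{M_N} \omega_m^{N,i} f(\xi_{0:m}^{N,i}) \xrightarrow{\mathbb{P}} \mu f,$$

$$(\Omega_m^N)^{-1} \max_{1 \leq i \leq M_N} \omega_m^{N,i} \xrightarrow{\mathbb{P}} 0.$$

Definition 3.2 (Asymptotic normality). A weighted sample $\{(\xi_{0:m}^{N,i}, \omega_m^{N,i})\}_{i=1}^{M_N}$ on
$\mathsf{X}^{m+1}$ is called asymptotically normal (abbreviated a.n.) for $(\mu, \mathsf{A}, \mathsf{W}, \sigma, \gamma, \{a_N\}_{N=1}^{\infty})$ if, as
$N \to \infty$,

$$a_N (\Omega_m^N)^{-1} \sum_{i=1}^{M_N} \omega_m^{N,i} [f(\xi_{0:m}^{N,i}) - \mu f] \xrightarrow{\mathcal{D}} \mathcal{N}[0, \sigma^2(f)] \quad \text{for any } f \in \mathsf{A},$$

$$a_N^2 (\Omega_m^N)^{-2} \sum_{i=1}^{M_N} (\omega_m^{N,i})^2 f(\xi_{0:m}^{N,i}) \xrightarrow{\mathbb{P}} \gamma f \quad \text{for any } f \in \mathsf{W},$$

$$a_N (\Omega_m^N)^{-1} \max_{1 \leq i \leq M_N} \omega_m^{N,i} \xrightarrow{\mathbb{P}} 0.$$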
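For a finite sample, the two quantities appearing in the consistency definition can be computed directly. The sketch below is illustrative only; the helper names `self_normalised_estimate` and `max_weight_fraction` and the toy numbers are assumptions, not part of the paper.

```python
def self_normalised_estimate(particles, weights, f):
    """Return sum_i w_i f(x_i) / sum_i w_i for a weighted sample."""
    total = sum(weights)
    return sum(w * f(x) for x, w in zip(particles, weights)) / total


def max_weight_fraction(weights):
    """Return max_i w_i / sum_i w_i, which should vanish as the sample grows."""
    return max(weights) / sum(weights)


# Toy weighted sample (made-up values).
particles = [1.0, 2.0, 3.0]
weights = [1.0, 1.0, 2.0]
estimate = self_normalised_estimate(particles, weights, lambda x: x)
fraction = max_weight_fraction(weights)
```

A large value of `fraction` signals that a single particle dominates the estimate, which is the situation the second condition of the definition rules out asymptotically.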

The main contributions of this section are the following results, which establish consistency and
asymptotic normality of weighted samples produced by the TSSPF and SSAPF algorithms.
For all $k \geq 0$, we define a transformation $\Phi_k$ on the set of $\phi_k$-integrable functions by

$$\Phi_k[f](x_{0:k}) \triangleq f(x_{0:k}) - \phi_k f, \quad x_{0:k} \in \mathsf{X}^{k+1}. \tag{3.1}$$

In addition, we impose the following assumptions.

(A1) For all $k \geq 1$, $t_k \in L^2(\mathsf{X}^{k+1}, \phi_k)$ and $w_k \in L^1(\mathsf{X}^{k+1}, \phi_k)$, where $t_k$ and $w_k$ are defined
in (2.1) and (2.2), respectively.

(A2) i) $\mathsf{A}_0 \subseteq L^1(\mathsf{X}, \phi_0)$ is a proper set and $\sigma_0 : \mathsf{A}_0 \to \mathbb{R}^+$ is a function satisfying, for all
$f \in \mathsf{A}_0$ and $a \in \mathbb{R}$, $\sigma_0(af) = |a|\,\sigma_0(f)$.

ii) The initial sample $\{(\xi_0^{N,i}, \omega_0^{N,i})\}_{i=1}^N$ is consistent for $[L^1(\mathsf{X}, \phi_0), \phi_0]$ and a.n. for
$[\phi_0, \mathsf{A}_0, \mathsf{W}_0, \sigma_0, \gamma_0, \{\sqrt{N}\}_{N=1}^{\infty}]$.

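Theorem 3.1. Assume (A1) and (A2) with $(\mathsf{W}_0, \gamma_0) = [L^1(\mathsf{X}, \phi_0), \phi_0]$. In the setting
of Algorithm 1, suppose that the limit $\beta \triangleq \lim_{N \to \infty} N/M_N$ exists, where $\beta \in [0, 1]$. Define
recursively the family $\{\mathsf{A}_k\}_{k=1}^{\infty}$ by

$$\mathsf{A}_{k+1} \triangleq \left\{ f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k),\ H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) \right\}. \tag{3.2}$$

Furthermore, define recursively the family $\{\sigma_k\}_{k=1}^{\infty}$ of functionals $\sigma_k : \mathsf{A}_k \to \mathbb{R}^+$ by

$$\sigma_{k+1}^2(f) \triangleq \phi_{k+1}\Phi_{k+1}^2[f] + \frac{\sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\} + \beta\, \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])\}\, \phi_k t_k}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}. \tag{3.3}$$

Then each $\mathsf{A}_k$ is a proper set for all $k \geq 1$. Moreover, each sample $\{(\xi_{0:k}^{N,i}, 1)\}_{i=1}^N$ produced
by Algorithm 1 is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal for
$[\phi_k, \mathsf{A}_k, L^1(\mathsf{X}^{k+1}, \phi_k), \sigma_k, \phi_k, \{\sqrt{N}\}_{N=1}^{\infty}]$.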
The proof is found in Appendix A, and as a by-product a similar result for the SSAPF
(Algorithm 2) is obtained.

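Theorem 3.2. Assume (A1) and (A2). Define the families $\{\tilde{\mathsf{W}}_k\}_{k=0}^{\infty}$ and $\{\tilde{\mathsf{A}}_k\}_{k=0}^{\infty}$ by

$$\tilde{\mathsf{W}}_k \triangleq \{ f \in L^1(\mathsf{X}^{k+1}, \phi_k) : w_k f \in L^1(\mathsf{X}^{k+1}, \phi_k) \}, \quad \tilde{\mathsf{W}}_0 \triangleq \mathsf{W}_0,$$

and, with $\tilde{\mathsf{A}}_0 \triangleq \mathsf{A}_0$,

$$\tilde{\mathsf{A}}_{k+1} \triangleq \left\{ f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k),\ H_k^u(\cdot, |f|) \in \tilde{\mathsf{A}}_k,\ [H_k^u(\cdot, |f|)]^2 \in \tilde{\mathsf{W}}_k,\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) \right\}. \tag{3.4}$$

Furthermore, define recursively the family $\{\tilde\sigma_k\}_{k=0}^{\infty}$ of functionals $\tilde\sigma_k : \tilde{\mathsf{A}}_k \to \mathbb{R}^+$ by

$$\tilde\sigma_{k+1}^2(f) \triangleq \frac{\tilde\sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\} + \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])\}\, \phi_k t_k}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}, \quad \tilde\sigma_0 \triangleq \sigma_0, \tag{3.5}$$

and the measures $\{\tilde\gamma_k\}_{k=1}^{\infty}$ by

$$\tilde\gamma_{k+1} f \triangleq \frac{\phi_{k+1}(w_{k+1} f)\, \phi_k t_k}{\phi_k H_k^u(\mathsf{X}^{k+2})}, \quad f \in \tilde{\mathsf{W}}_{k+1}.$$

Then each $\tilde{\mathsf{A}}_k$ is a proper set for all $k \geq 1$. Moreover, each sample $\{(\tilde\xi_{0:k}^{N,i}, \tilde\omega_k^{N,i})\}_{i=1}^N$ produced
by Algorithm 2 is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal for
$[\phi_k, \tilde{\mathsf{A}}_k, \tilde{\mathsf{W}}_k, \tilde\sigma_k, \tilde\gamma_k, \{\sqrt{N}\}_{N=1}^{\infty}]$.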

Under the assumption of bounded likelihood and second-stage importance weight functions
$g_k$ and $w_k$, one can show that the CLTs stated in Theorems 3.1 and 3.2 indeed include
any functions having finite second moments with respect to the joint smoothing distributions;
that is, under these assumptions the supplementary constraints on the sets (3.2) and
(3.4) are automatically fulfilled. This is the content of the statement below.

(A3) For all $k \geq 0$, $\|g_k\|_{\mathsf{X},\infty} < \infty$ and $\|w_k\|_{\mathsf{X}^{k+1},\infty} < \infty$.

Corollary 3.1. Assume (A3) and let $\{\mathsf{A}_k\}_{k=1}^{\infty}$ and $\{\tilde{\mathsf{A}}_k\}_{k=1}^{\infty}$ be defined by (3.2) and
(3.4), respectively, with $\tilde{\mathsf{A}}_0 = \mathsf{A}_0 \triangleq L^2(\mathsf{X}, \phi_0)$. Then, for all $k \geq 1$, $\mathsf{A}_k = L^2(\mathsf{X}^{k+1}, \phi_k)$ and
$L^2(\mathsf{X}^{k+1}, \phi_k) \subseteq \tilde{\mathsf{A}}_k$.

For a proof, see Section A.2.

Interestingly, the expressions of $\tilde\sigma_{k+1}^2(f)$ and $\sigma_{k+1}^2(f)$ differ, for $\beta = 1$, only in the
additive term $\phi_{k+1}\Phi_{k+1}^2[f]$, that is, the variance of $f$ under $\phi_{k+1}$. This quantity represents
the cost of introducing the second-stage resampling pass, which was proposed as a means
of preventing the particle approximation from degenerating. In the coming Section 3.2 we
will however show that the approximations produced by the SSAPF are already stable for
a finite time horizon, and that additional resampling is superfluous. Thus, there are indeed
reasons for strongly questioning whether second-stage resampling should be performed at
all, at least when the same number of particles is used in the two stages.
3.2. Bounds on $L^p$ error and bias

In this part we examine, under suitable regularity conditions and for a finite particle population,
the errors of the approximations obtained by the APF in terms of $L^p$ bounds and bounds
on the bias. We preface our main result with some definitions and assumptions. Denote by
$\mathcal{B}_b(\mathsf{X}^m)$ the space of bounded measurable functions on $\mathsf{X}^m$ furnished with the supremum norm
$\|f\|_{\mathsf{X}^m,\infty} \triangleq \sup_{x \in \mathsf{X}^m} |f(x)|$. Let, for $f \in \mathcal{B}_b(\mathsf{X}^m)$, the oscillation semi-norm (alternatively
termed the global modulus of continuity) be defined by $\operatorname{osc}(f) \triangleq \sup_{(x,x') \in \mathsf{X}^m \times \mathsf{X}^m} |f(x) - f(x')|$.
Furthermore, the $L^p$ norm of a stochastic variable $X$ is denoted by $\|X\|_p \triangleq \mathbb{E}^{1/p}[|X|^p]$.
When considering sums, we will make use of the standard convention $\sum_{k=a}^{b} c_k = 0$ if $b < a$.

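In the following we will assume that all measures $Q(x, \cdot)$, $x \in \mathsf{X}$, have densities $q(x, \cdot)$
with respect to a common dominating measure $\mu$ on $(\mathsf{X}, \mathcal{X})$. Moreover, we suppose that
the following holds.

(A4) i) $\epsilon_- \triangleq \inf_{(x,x') \in \mathsf{X}^2} q(x, x') > 0$, $\epsilon_+ \triangleq \sup_{(x,x') \in \mathsf{X}^2} q(x, x') < \infty$.
ii) For all $y \in \mathsf{Y}$, $\int_{\mathsf{X}} g(y|x)\,\mu(dx) > 0$.

Under (A4) we define

$$\rho \triangleq 1 - \frac{\epsilon_-}{\epsilon_+}. \tag{3.6}$$

(A5) For all $k \geq 0$, $\|t_k\|_{\mathsf{X}^{k+1},\infty} < \infty$.

Assumption (A4) is now standard and is often satisfied when the state space $\mathsf{X}$ is compact;
it implies that the hidden chain, when evolving conditionally on the observations, is
geometrically ergodic with a mixing rate given by $\rho < 1$. For comprehensive treatments of
such stability properties within the framework of state space models we refer to Del Moral
(2004). Finally, let $\mathsf{C}_i(\mathsf{X}^{n+1})$ be the set of bounded measurable functions $f$ on $\mathsf{X}^{n+1}$ of type
$f(x_{0:n}) = \bar f(x_{i:n})$ for some function $\bar f : \mathsf{X}^{n-i+1} \to \mathbb{R}$. In this setting we have the following
result, which is proved in Section A.3.

Theorem 3.3. Assume (A3), (A4), (A5), and let $f_i \in \mathsf{C}_i(\mathsf{X}^{n+1})$ for $0 \leq i \leq n$. Let
$\{(\tilde\xi_{0:k}^{N,j}, \tilde\omega_k^{N,j})\}_{j=1}^{R_N(r)}$ be a weighted particle sample produced by Algorithm $r$, $r \in \{1, 2\}$, with
$R_N(r) \triangleq \mathbb{1}\{r = 1\} M_N + \mathbb{1}\{r = 2\} N$. Then the following holds.

i) For all $p \geq 2$,

$$\left\| (\tilde\Omega_n^N)^{-1} \sum_{j=1}^{R_N(r)} \tilde\omega_n^{N,j} f_i(\tilde\xi_{0:n}^{N,j}) - \phi_n f_i \right\|_p \leq B_p \frac{\operatorname{osc}(f_i)}{1 - \rho} \left[ \frac{1}{\epsilon_- \sqrt{R_N(r)}} \sum_{k=1}^{n} \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty} \|t_{k-1}\|_{\mathsf{X}^{k},\infty}}{\mu g_k} \rho^{0 \vee (i-k)} + \frac{\mathbb{1}\{r = 1\}}{\sqrt{N}} \left( \frac{\rho}{1 - \rho} + n - i \right) + \frac{\|w_0\|_{\mathsf{X},\infty}}{\nu g_0 \sqrt{N}} \rho^{i} \right],$$

ii)

$$\left| \mathbb{E}\left[ (\tilde\Omega_n^N)^{-1} \sum_{j=1}^{R_N(r)} \tilde\omega_n^{N,j} f_i(\tilde\xi_{0:n}^{N,j}) \right] - \phi_n f_i \right| \leq B \frac{\operatorname{osc}(f_i)}{(1 - \rho)^2} \left[ \frac{1}{R_N(r)\, \epsilon_-^2} \sum_{k=1}^{n} \frac{\|w_k\|_{\mathsf{X}^{k+1},\infty}^2 \|t_{k-1}\|_{\mathsf{X}^{k},\infty}^2}{(\mu g_k)^2} \rho^{0 \vee (i-k)} + \frac{\mathbb{1}\{r = 1\}}{N} \left( \frac{\rho}{1 - \rho} + n - i \right) + \frac{\|w_0\|_{\mathsf{X},\infty}^2}{N (\nu g_0)^2} \rho^{i} \right].$$

Here $\rho$ is defined in (3.6), and $B_p$ and $B$ are universal constants such that $B_p$ depends on
$p$ only.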

In particular, applying, under the assumption that all fractions $\|w_k\|_{\mathsf{X}^{k+1},\infty} \|t_{k-1}\|_{\mathsf{X}^{k},\infty} / \mu g_k$
are uniformly bounded in $k$, Theorem 3.3 for $i = n$ yields error bounds on the approximate
filter distribution which are uniformly bounded in $n$. From this it is obvious that the
first-stage resampling pass is enough to preserve the sample stability. Indeed, by avoiding
second-stage selection according to Algorithm 2 we can, since the middle terms in the
bounds above cancel in this case, obtain even tighter control of the $L^p$ error for a fixed
number of particles.



4. Identifying asymptotically optimal first-stage weights 

The formulas (3.3) and (3.5) for the asymptotic variances of the TSSPF and SSAPF may
look complicated at first sight, but by carefully examining them we will obtain important
knowledge of how to choose the first-stage importance weight functions $t_k$ in order
to robustify the APF.

Assume that we have run the APF up to time $k$ and are about to design suitable first-stage
weights for the next iteration. In this setting, we call a first-stage weight function
$t'_k[f]$, possibly depending on the target function $f \in \mathsf{A}_{k+1}$ and satisfying (A1), optimal (at
time $k$) if it provides a minimal increase of asymptotic variance at a single iteration of the
APF algorithm, that is, if $\sigma_{k+1}^2\{t'_k[f]\}(f) \leq \sigma_{k+1}^2\{t\}(f)$ (or $\tilde\sigma_{k+1}^2\{t'_k[f]\}(f) \leq \tilde\sigma_{k+1}^2\{t\}(f)$)
for all other measurable and positive weight functions $t$. Here we let $\sigma_{k+1}^2\{t\}(f)$ denote the
asymptotic variance induced by $t$. Define, for $x_{0:k} \in \mathsf{X}^{k+1}$,

$$t_k^*[f](x_{0:k}) \triangleq \sqrt{\int_{\mathsf{X}} g_{k+1}^2(x_{k+1}) \left[ \frac{dQ(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}) \right]^2 \Phi_{k+1}^2[f](x_{0:k+1})\, R_k(x_k, dx_{k+1})}, \tag{4.1}$$

and let $w_{k+1}^*[f]$ denote the second-stage importance weight function induced by $t_k^*[f]$ according
to (2.2). We are now ready to state the main result of this section. The proof is
found in Section A.4.

Theorem 4.1. Let $k \geq 0$ and define $t_k^*$ by (4.1). Then the following is valid.

i) Let the assumptions of Theorem 3.1 hold and suppose that $f \in \{f' \in \mathsf{A}_{k+1} : t_k^*[f'] \in
L^2(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1}^*[f'] \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$. Then $t_k^*$ is optimal for Algorithm 1 and the
corresponding minimal variance is given by

$$\sigma_{k+1}^2\{t_k^*\}(f) = \phi_{k+1}\Phi_{k+1}^2[f] + \frac{\sigma_k^2[H_k^u(\cdot, \Phi_{k+1}[f])] + \beta (\phi_k t_k^*[f])^2}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}.$$

ii) Let the assumptions of Theorem 3.2 hold and suppose that $f \in \{f' \in \tilde{\mathsf{A}}_{k+1} : t_k^*[f'] \in
L^2(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1}^*[f'] \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}$. Then $t_k^*$ is optimal for Algorithm 2 and the
corresponding minimal variance is given by

$$\tilde\sigma_{k+1}^2\{t_k^*\}(f) = \frac{\tilde\sigma_k^2[H_k^u(\cdot, \Phi_{k+1}[f])] + (\phi_k t_k^*[f])^2}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}.$$
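
The key inequality behind this optimality can be sketched in a few lines. This is not the proof of Section A.4, only an illustration, and it assumes that the variance recursions (3.3) and (3.5) depend on the first-stage weight function $t_k$ only through the product $\phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])\}\, \phi_k t_k$. Since, on the support of the particle sample, $w_{k+1}$ equals $g_{k+1}\,(dQ/dR_k)/t_k$ by (2.2), one gets $t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f]) = (t_k^*[f])^2 / t_k$, and the Cauchy–Schwarz inequality yields

$$\phi_k\!\left\{ \frac{(t_k^*[f])^2}{t_k} \right\} \phi_k t_k \;\geq\; \left( \phi_k\!\left\{ \frac{t_k^*[f]}{\sqrt{t_k}} \sqrt{t_k} \right\} \right)^2 = (\phi_k t_k^*[f])^2,$$

with equality when $t_k$ is proportional to $t_k^*[f]$. The right hand side is the term appearing in the minimal variances of Theorem 4.1.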



The functions $t_k^*$ have a natural interpretation in terms of optimal sample allocation for
stratified sampling. Consider the mixture $\pi = \sum_{i=1}^{d} w_i \mu_i$, each $\mu_i$ being a measure on some
measurable space $(E, \mathcal{E})$ and $\sum_{i=1}^{d} w_i = 1$, and the problem of estimating, for some given
$\pi$-integrable target function $f$, the expectation $\pi f$. In order to relate this to the particle
filtering paradigm, we will make use of Algorithm 3.

Algorithm 3 Stratified importance sampling

1: for $i = 1, \ldots, N$ do
2:   draw an index $J_i$ multinomially with respect to $\tau_j$, $1 \leq j \leq d$, $\sum_{j=1}^{d} \tau_j = 1$;
3:   simulate $\xi_i \sim \nu_{J_i}$, and
4:   compute the weights $\omega_i \triangleq \dfrac{w_{J_i}}{\tau_{J_i}} \dfrac{d\mu_{J_i}}{d\nu_{J_i}}(\xi_i)$.
5: end for
6: Take $\{(\xi_i, \omega_i)\}_{i=1}^{N}$ as an approximation of $\pi$.

In other words, we perform Monte Carlo estimation of $\pi f$ by means of sampling from some
proposal mixture $\sum_{j=1}^{d} \tau_j \nu_j$ and forming a self-normalised estimate; cf. the technique
applied in Section 2.2 for sampling from $\bar\phi_{k+1}^N$. In this setting, the following CLT can be
established under weak assumptions:



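$$\sqrt{N} \left[ \sum_{i=1}^{N} \frac{\omega_i}{\sum_{\ell=1}^{N} \omega_\ell} f(\xi_i) - \pi f \right] \xrightarrow{\mathcal{D}} \mathcal{N}\!\left[ 0, \sum_{j=1}^{d} \frac{w_j^2 \alpha_j(f)}{\tau_j} \right]$$

with, for $x \in E$,

$$\alpha_i(f) \triangleq \int_E \left[ \frac{d\mu_i}{d\nu_i}(x) \right]^2 \Pi^2[f](x)\, \nu_i(dx) \quad \text{and} \quad \Pi[f](x) \triangleq f(x) - \pi f.$$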



Minimising the asymptotic variance $\sum_{i=1}^{d} [w_i^2 \alpha_i(f)/\tau_i]$ with respect to $\tau_i$, $1 \leq i \leq d$, e.g.,
by means of the Lagrange multiplier method (the details are simple), yields the optimal
weights

$$\tau_i^* \propto w_i \sqrt{\alpha_i(f)} = w_i \sqrt{\int_E \left[ \frac{d\mu_i}{d\nu_i}(x) \right]^2 \Pi^2[f](x)\, \nu_i(dx)},$$

and the similarity between this expression and that of the optimal first-stage importance
weight functions $t_k^*$ is striking. This strongly supports the idea of interpreting optimal
sample allocation for particle filters in terms of variance reduction for stratified sampling.
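
The allocation rule can be checked numerically on a small example. The sketch below assumes that the per-stratum quantities $\alpha_i(f)$ are already known; the function names and the numerical values of the mixture weights and $\alpha_i(f)$ are hypothetical and serve only to compare the optimal allocation with the naive choice $\tau_i = w_i$.

```python
import math


def asymptotic_variance(w, alpha, tau):
    """Return sum_i w_i^2 * alpha_i / tau_i for an allocation tau."""
    return sum(wi * wi * ai / ti for wi, ai, ti in zip(w, alpha, tau))


def optimal_allocation(w, alpha):
    """Return tau_i proportional to w_i * sqrt(alpha_i), normalised to one."""
    raw = [wi * math.sqrt(ai) for wi, ai in zip(w, alpha)]
    total = sum(raw)
    return [r / total for r in raw]


w = [0.5, 0.3, 0.2]        # hypothetical mixture weights
alpha = [1.0, 4.0, 0.25]   # hypothetical values of alpha_i(f)
tau_star = optimal_allocation(w, alpha)
v_star = asymptotic_variance(w, alpha, tau_star)   # equals (sum_i w_i sqrt(alpha_i))^2
v_prop = asymptotic_variance(w, alpha, w)          # allocation tau_i = w_i
```

With these numbers the optimal allocation gives variance 1.44 against 1.75 for $\tau_i = w_i$; strata with a large $\alpha_i(f)$ receive more than their mixture share of the samples.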
5. Implementations

As shown in the previous section, the utilisation of the optimal weights (4.1) provides,
for a given sequence $\{R_k\}_{k=0}^{\infty}$ of proposal kernels, the most efficient of all particle filters
belonging to the large class covered by Algorithm 2 (including the standard bootstrap
filter and any fully adapted particle filter). However, exact computation of the optimal
weights is in general infeasible for two reasons: firstly, they depend (via $\Phi_{k+1}[f]$) on the
expectation $\phi_{k+1} f$, that is, the quantity that we aim to estimate, and, secondly, they involve
the evaluation of a complicated integral. A comprehensive treatment of the important issue
of how to approximate the optimal weights is beyond the scope of this paper, but in the
following three examples we discuss some possible heuristics for doing this.



5.1. Nonlinear Gaussian model 

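In order to form an initial idea of the performance of the optimal SSAPF in practice, we
apply the method to a first order (possibly nonlinear) autoregressive model observed in noise:

$$\begin{aligned} X_{k+1} &= m(X_k) + \sigma_w(X_k) W_{k+1}, \\ Y_k &= X_k + \sigma_v V_k, \end{aligned} \tag{5.1}$$

with $\{W_k\}_{k=1}^{\infty}$ and $\{V_k\}_{k=0}^{\infty}$ being mutually independent sets of standard normal distributed
variables such that $W_{k+1}$ is independent of $(X_i, Y_i)$, $0 \leq i \leq k$, and $V_k$ is independent of $X_k$,
$(X_i, Y_i)$, $0 \leq i \leq k - 1$. Here the functions $\sigma_w : \mathbb{R} \to \mathbb{R}^+$ and $m : \mathbb{R} \to \mathbb{R}$ are measurable,
and $\mathsf{X} = \mathbb{R}$. As observed by Pitt and Shephard (1999a), it is, for all models of form (5.1),
possible to propose new particles using the optimal kernel directly, yielding $R_k^p = H_k$ and,
for $(x, x') \in \mathbb{R}^2$,

$$r_k(x, x') = \frac{1}{\tilde\sigma_k(x)\sqrt{2\pi}} \exp\left\{ -\frac{[x' - \tilde m_k(x)]^2}{2\tilde\sigma_k^2(x)} \right\}, \tag{5.2}$$

with $r_k$ denoting the density of $R_k$ with respect to the Lebesgue measure, and

$$\tilde m_k(x) \triangleq \left[ \frac{y_{k+1}}{\sigma_v^2} + \frac{m(x)}{\sigma_w^2(x)} \right] \tilde\sigma_k^2(x), \qquad \tilde\sigma_k^2(x) \triangleq \frac{\sigma_w^2(x)\, \sigma_v^2}{\sigma_w^2(x) + \sigma_v^2}. \tag{5.3}$$

For the proposal (5.2) it is, for $x_{k:k+1} \in \mathbb{R}^2$, valid that

$$g_{k+1}(x_{k+1}) \frac{dQ(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}) \propto h_k(x_k) \triangleq \frac{\tilde\sigma_k(x_k)}{\sigma_w(x_k)} \exp\left[ \frac{\tilde m_k^2(x_k)}{2\tilde\sigma_k^2(x_k)} - \frac{m^2(x_k)}{2\sigma_w^2(x_k)} \right], \tag{5.4}$$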



and since the right hand side does not depend on $x_{k+1}$ we can, by letting $t_k(x_{0:k}) =
h_k(x_k)$, $x_{0:k} \in \mathbb{R}^{k+1}$, obtain second-stage weights being indeed unity (providing a sample
of genuinely $\bar\phi_{k+1}^N$-distributed particles). When this is achieved, Pitt and Shephard (1999a)
call the particle filter fully adapted. There is however nothing in the previous theoretical
analysis that supports the idea that aiming at evenly distributed second-stage weights is
always convenient, and this will also be illustrated in the simulations below. On the other
hand, it is possible to find cases when the fully adapted particle filter is very close to being
optimal; see again the following discussion.
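
The claim that the weight ratio does not depend on $x_{k+1}$ can be verified numerically. The sketch below assumes the usual Gaussian conjugacy formulas for the optimal kernel of model (5.1), namely variance $\sigma_w^2(x)\sigma_v^2/(\sigma_w^2(x) + \sigma_v^2)$ and mean equal to that variance times $y_{k+1}/\sigma_v^2 + m(x)/\sigma_w^2(x)$; the function names and the particular choices of $m$ and $\sigma_w$ are hypothetical.

```python
import math


def normal_pdf(z, mean, sd):
    return math.exp(-0.5 * ((z - mean) / sd) ** 2) / (sd * math.sqrt(2.0 * math.pi))


def adapted_moments(x, y_next, m, sigma_w, sigma_v):
    """Mean and standard deviation of X_{k+1} given X_k = x and Y_{k+1} = y_next."""
    sw2, sv2 = sigma_w(x) ** 2, sigma_v ** 2
    var = sw2 * sv2 / (sw2 + sv2)
    mean = var * (y_next / sv2 + m(x) / sw2)
    return mean, math.sqrt(var)


def h(x, y_next, m, sigma_w, sigma_v):
    """First-stage weight of the fully adapted filter, up to a constant factor."""
    mean, sd = adapted_moments(x, y_next, m, sigma_w, sigma_v)
    exponent = mean ** 2 / (2.0 * sd ** 2) - m(x) ** 2 / (2.0 * sigma_w(x) ** 2)
    return sd / sigma_w(x) * math.exp(exponent)


def weight_ratio(x, x_next, y_next, m, sigma_w, sigma_v):
    """g(y_next | x_next) * q(x, x_next) / r(x, x_next) for the optimal proposal r."""
    mean, sd = adapted_moments(x, y_next, m, sigma_w, sigma_v)
    g = normal_pdf(y_next, x_next, sigma_v)
    q = normal_pdf(x_next, m(x), sigma_w(x))
    r = normal_pdf(x_next, mean, sd)
    return g * q / r


def sw_fn(x):
    # Hypothetical state-dependent noise level, chosen only for illustration.
    return 0.5 + 0.1 * abs(x)


m_fn = math.sin  # hypothetical regression function
ratio_a = weight_ratio(0.3, 0.0, 0.7, m_fn, sw_fn, 1.0)
ratio_b = weight_ratio(0.3, 1.0, 0.7, m_fn, sw_fn, 1.0)
```

Here `ratio_a` and `ratio_b` agree up to rounding although they use different values of $x_{k+1}$, and dividing either by `h(0.3, 0.7, m_fn, sw_fn, 1.0)` gives a constant that no longer depends on $x_k$.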

In this part we will study the following two special cases of (5.1): 
• $m(X_k) = \phi X_k$ and $\sigma_w(X_k) \equiv \sigma$

For a linear/Gaussian model of this kind, exact expressions of the optimal weights can
be obtained using the Kalman filter. We set $\phi = 0.9$ and let the latent chain be put
at stationarity from the beginning, that is, $X_0 \sim \mathcal{N}[0, \sigma^2/(1 - \phi^2)]$. In this setting,
we simulated, for $\sigma = \sigma_v = 0.1$, a record $y_{0:10}$ of observations and estimated the
filter posterior means (corresponding to projection target functions $\pi_k(x_{0:k}) \triangleq x_k$,
$x_{0:k} \in \mathbb{R}^{k+1}$) along this trajectory by applying (1) SSAPF based on true optimal
weights, (2) SSAPF based on the generic weights of Pitt and Shephard (1999a),
and (3) the standard bootstrap particle filter (that is, SSAPF with $t_k \equiv 1$). In this
first experiment, the prior kernel $Q$ was taken as proposal in all cases, and since
the optimal weights are derived using asymptotic arguments we used as many as
100,000 particles for all algorithms. The result is displayed in Figure 1(a), and it is
clear that operating with true optimal allocation weights improves, as expected, the
MSE performance in comparison with the other methods.

The main motivation of Pitt and Shephard (1999a) for introducing auxiliary particle
filtering was to robustify the particle approximation to outliers. Thus, we mimic
Cappé et al. (2005, Example 7.2.3) and repeat the experiment above for the observation
record $y_{0:5} = (-0.652, -0.345, -0.676, 1.142, 0.721, 20)$, standard deviations
$\sigma_v = 1$, $\sigma = 0.1$, and the smaller particle sample size $N = 10{,}000$. Note the large
discrepancy of the last observation $y_5$, which in this case is located at a distance of
20 standard deviations from the mean of the stationary distribution. The outcome is
plotted in Figure 1(b) from which it is evident that the particle filter based on the
optimal weights is the most efficient also in this case; moreover, the performance of
the standard auxiliary particle filter is improved in comparison with the bootstrap
filter. Figure 2 displays a plot of the weight functions $t_k^*$ and $t_k^{P\&S}$ for the same
observation record. It is clear that $t_k^{P\&S}$ is not too far away from the optimal weight
function (which is close to symmetric in this extreme situation) in this case, even
if the distance between the functions as measured with the supremum norm is still
significant.

Finally, we implement the fully adapted filter (with proposal kernels and first-stage
weights given by (5.2) and (5.4), respectively) and compare this with the SSAPF based
on the same proposal (5.2) and optimal first-stage weights, the latter being given by,
for $x_{0:k} \in \mathbb{R}^{k+1}$ and $h_k$ defined in (5.4),

$$t_k^*[\pi_{k+1}](x_{0:k}) \propto h_k(x_k) \sqrt{\tilde\sigma_k^2(x_k) + [\tilde m_k(x_k) - \phi_{k+1}\pi_{k+1}]^2} \tag{5.5}$$

in this case. We note that $h_k$, that is, the first-stage weight function for the fully
adapted filter, enters as a factor in the optimal weight function (5.5). Moreover,
recall the definitions (5.3) of $\tilde m_k$ and $\tilde\sigma_k$; in the case of very informative observations,
corresponding to $\sigma_v \ll \sigma$, it holds that $\tilde\sigma_k(x) \approx \sigma_v$ and $\tilde m_k(x) \approx y_{k+1}$ with good
precision for moderate values of $x \in \mathbb{R}$ (that is, values not too far away from the mean
of the stationary distribution of $X$). Thus, the factor beside $h_k$ in (5.5) is more or less
constant in this case, implying that the fully adapted and optimal first-stage weight
filters are close to equivalent. This observation is perfectly confirmed in Figure 3(a)
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(a) (b) 

Fig. 1. Plot of MSE perfomances (on log-scale) of the bootstrap particle filter (*), the SSAPF based 
on optimal weights (□), and the SSAPF based on the generic weights of Pitt and Shephard 
(1 999a) (o). The MSE values are founded on 1 00,000 particles and 400 runs of each algorithm. 




14 16 18 20 22 24 26 28 

particle position 



Fig. 2. Plot of the first-stage importance weight functions t\ (unbroken line) and (clashed line) 
in the presence of an outlier. 
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Fig. 3. Plot of MSE perfomances (on log-scale) of the bootstrap particle filter (*), the SSAPF based 
on optimal weights (□), the SSAPF based on the generic weights ££ &s (o), and the fully adapted 
SSAPF (x) for the Linear/Gaussian model in Section 5.1. The MSE values are computed using 
10,000 particles and 400 runs of each algorithm. 



which presents MSE performances for a v = 0.1, u = 1, and N = 10,000. In the 
same figure, the bootstrap filter and the standard auxiliary filter based on generic 
weights are included for a comparison, and these (particularly the latter) are marred 
with significantly larger Monte Carlo errors. On the contrary, in the case of non- 
informative observations, that is, u v » u, we note that &k{x) ~ f, fhk{x) w <px 
and conclude that the optimal kernel is close the prior kernel Q. In addition, the 
exponent of hk vanishes, implying uniform first-stage weights for the fully adapted 
particle filter. Thus, the fully adapted filter will be close to the bootstrap filter in this 
case, and Figure 3(b) seems to confirm this remark. Moreover, the optimal first-stage 
weight filter does clearly better than the others in terms of MSE performance. 
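
The two limiting regimes discussed above can be checked numerically. The sketch below assumes that (5.3), which lies outside this section, is the usual Gaussian conjugate update for $X_{k+1} = \phi X_k + \sigma W_{k+1}$, $Y_{k+1} = X_{k+1} + \sigma_v V_{k+1}$; the function name and the numerical values of `phi`, `x` and `y_next` are hypothetical, and the non-informative setting $\sigma_v = 10$ is chosen here only for illustration.

```python
import math


def optimal_kernel_moments(x, y_next, phi, sigma, sigma_v):
    """Mean and standard deviation of the Gaussian density proportional to
    q(x, x') g_{k+1}(x') under the assumed linear/Gaussian model."""
    var = (sigma**2 * sigma_v**2) / (sigma**2 + sigma_v**2)
    mean = var * (phi * x / sigma**2 + y_next / sigma_v**2)
    return mean, math.sqrt(var)


# Informative observations (sigma_v << sigma): moments close to (y_{k+1}, sigma_v).
m_inf, s_inf = optimal_kernel_moments(0.5, 1.0, phi=0.9, sigma=1.0, sigma_v=0.1)

# Non-informative observations (sigma_v >> sigma): moments close to (phi x, sigma).
m_non, s_non = optimal_kernel_moments(0.5, 1.0, phi=0.9, sigma=1.0, sigma_v=10.0)
```

In the first call the returned pair is within about one percent of $(y_{k+1}, \sigma_v)$, and in the second within about one percent of $(\phi x, \sigma)$, which is the behaviour the text relies on.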

• $m(X_k) \equiv 0$ and $\sigma_w(X_k) = \sqrt{\beta_0 + \beta_1 X_k^2}$ 

Here we deal with the classical Gaussian autoregressive conditional heteroscedasticity 
(ARCH) model (see Bollerslev et al., 1994) observed in noise. Since the nonlinear 
state equation precludes exact computation of the filtered means, implementing the 
optimal first-stage weight SSAPF is considerably more challenging in this case. The 
problem can however be tackled by means of an introductory zero-stage simulation 
pass, based on $R \ll N$ particles, in which a crude estimate of $\phi_{k+1} f$ is obtained. 
For instance, this can be achieved by applying the standard bootstrap filter with 
multinomial resampling. Using this approach, we computed again MSE values for 
the bootstrap filter, the standard SSAPF based on generic weights, the fully adapted 
SSAPF, and the (approximate) optimal first-stage weight SSAPF, the latter using 
the optimal proposal kernel. Each algorithm used 10,000 particles and the number of 
particles in the prefatory pass was set to $R = N/10 = 1000$, implying only minor 
additional computational work. An imitation of the true filter means was obtained 
by running the bootstrap filter with as many as 500,000 particles. In compliance 
with the foregoing, we considered the case of informative (Figure 4(a)) as well as 
non-informative (Figure 4(b)) observations, corresponding to $(\beta_0, \beta_1, \sigma_v) = (9, 5, 1)$ 
and $(\beta_0, \beta_1, \sigma_v) = (0.1, 1, 3)$, respectively. Since $\tilde{\sigma}_k(x) \approx \sigma_v$, $\tilde{m}_k(x) \approx y_{k+1}$ in the 
latter case, we should, in accordance with the previous discussion, again expect the 
fully adapted filter to be close to that based on optimal first-stage weights. This is 
also confirmed in the plot. For the former parameter set, the fully adapted SSAPF 
exhibits an MSE performance close to that of the bootstrap filter, while the optimal 
first-stage weight SSAPF is clearly superior. 

Fig. 4. Plot of MSE performances (on log-scale) of the bootstrap particle filter (*), the SSAPF based 
on optimal weights (□), the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ (o), and the fully adapted 
SSAPF (x) for the ARCH model in Section 5.1. The MSE values are computed using 10,000 particles 
and 400 runs of each algorithm. 
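
The zero-stage pass described above can be sketched as follows for the filter mean, that is, for $f$ equal to the projection on the last coordinate. This is one possible reading, not necessarily the authors' implementation: the text does not say how the $R$ pilot particles are selected, so the sketch assumes they are drawn uniformly from the current equally weighted cloud, moved with the ARCH prior kernel, and weighted by the Gaussian likelihood of $Y_{k+1} = X_{k+1} + \sigma_v V_{k+1}$. The function name is ours.

```python
import math
import random


def pilot_filter_mean(particles, y_next, beta0, beta1, sigma_v, r, rng):
    """Crude estimate of the next filter mean from a pilot pass with r particles."""
    # Assumption: ancestors are drawn uniformly from an equally weighted cloud.
    ancestors = [rng.choice(particles) for _ in range(r)]
    # Prior (ARCH) move: X' = sqrt(beta0 + beta1 * x^2) * W with W standard Gaussian.
    moved = [math.sqrt(beta0 + beta1 * x * x) * rng.gauss(0.0, 1.0)
             for x in ancestors]
    # Gaussian log-likelihood weights, shifted by their maximum for stability.
    logw = [-0.5 * ((y_next - x) / sigma_v) ** 2 for x in moved]
    top = max(logw)
    w = [math.exp(lw - top) for lw in logw]
    # Self-normalised importance sampling estimate of the filter mean.
    return sum(wi * x for wi, x in zip(w, moved)) / sum(w)
```

The returned value would then be plugged into the optimal first-stage weights in place of the unknown filter mean; since only $R = N/10$ particles are involved, the extra cost stays small.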



5.2. Stochastic volatility 

As a final example we consider the canonical discrete-time stochastic volatility (SV) model 
(Hull and White, 1987) given by 

$$X_{k+1} = \phi X_k + \sigma W_{k+1}, \qquad Y_k = \beta \exp(X_k/2)\, V_k,$$

where $\mathsf{X} = \mathbb{R}$, and $\{W_k\}_{k=1}^{\infty}$ and $\{V_k\}_{k=0}^{\infty}$ are as in Example 5.1. Here $X$ and $Y$ are 
log-volatility and log-returns, respectively, where the former are assumed to be stationary. 
This model was also treated by Pitt and Shephard (1999a), who discussed approximate 
full adaptation of the particle filter by means of a second order Taylor approximation of 
the concave function $x' \mapsto \log g_{k+1}(x')$. More specifically, by multiplying the approximate 
observation density obtained in this way with $q(x, x')$, $(x, x') \in \mathbb{R}^2$, yielding a Gaussian 
approximation of the optimal kernel density, nearly even second-stage weights can 
be obtained. We proceed in the same spirit, approximating however directly the 
(log-concave) function $x' \mapsto g_{k+1}(x') q(x, x')$ by means of a second order Taylor expansion of 
$x' \mapsto \log[g_{k+1}(x') q(x, x')]$ around the mode $\tilde{m}_k(x)$ (obtained using Newton iterations) of 
the same: 

$$g_{k+1}(x') q(x, x') \approx r_k^u(x, x') \triangleq g_{k+1}[\tilde{m}_k(x)]\, q[x, \tilde{m}_k(x)] \exp\left\{ -\frac{1}{2\tilde{\sigma}_k^2(x)} [x' - \tilde{m}_k(x)]^2 \right\},$$

with $\tilde{\sigma}_k^2(x)$ (we refer to Cappé et al., 2005, pp. 225-228, for details) being the inverted 
negative of the second order derivative, evaluated at $\tilde{m}_k(x)$, of $x' \mapsto \log[g_{k+1}(x') q(x, x')]$. 
Thus, by letting, for $(x, x') \in \mathbb{R}^2$, $r_k(x, x') = r_k^u(x, x') / \int_{\mathbb{R}} r_k^u(x, x'')\, dx''$, we obtain 

$$g_{k+1}(x_{k+1}) \frac{dQ(x_k, \cdot)}{dR_k(x_k, \cdot)}(x_{k+1}) \approx \int_{\mathbb{R}} r_k^u(x_k, x')\, dx' \propto \tilde{\sigma}_k(x_k)\, g_{k+1}[\tilde{m}_k(x_k)]\, q[x_k, \tilde{m}_k(x_k)], \tag{5.6}$$

and letting, for $x_{0:k} \in \mathbb{R}^{k+1}$, $t_k(x_{0:k}) = \tilde{\sigma}_k(x_k)\, g_{k+1}[\tilde{m}_k(x_k)]\, q[x_k, \tilde{m}_k(x_k)]$ will imply a nearly 
fully adapted particle filter. Moreover, by applying the approximate relation (5.6) to the 
expression (4.1) of the optimal weights, we get (cf. (5.5)) 

$$\begin{aligned}
t_k^*[\pi_{k+1}](x_{0:k}) &\approx \int_{\mathbb{R}} r_k^u(x_k, x')\, dx' \sqrt{\int_{\mathbb{R}} \Phi_{k+1}^2[\pi_{k+1}](x)\, R_k(x_k, dx)} \\
&\propto \tilde{\sigma}_k(x_k)\, g_{k+1}[\tilde{m}_k(x_k)]\, q[x_k, \tilde{m}_k(x_k)] \sqrt{\tilde{\sigma}_k^2(x_k) + \tilde{m}_k^2(x_k) - 2 \tilde{m}_k(x_k)\, \phi_{k+1}\pi_{k+1} + \phi_{k+1}^2 \pi_{k+1}}.
\end{aligned} \tag{5.7}$$
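
The Newton step and the two weight functions can be sketched as below. This is a minimal illustration under stated assumptions, not the authors' code: $V_k$ is taken to be standard Gaussian, so that $\log g_{k+1}(x') = -x'/2 - y_{k+1}^2 e^{-x'}/(2\beta^2)$ up to a constant, and $q(x, \cdot)$ is the $N(\phi x, \sigma^2)$ density. All function names are ours, and `filter_mean` stands for whatever estimate of $\phi_{k+1}\pi_{k+1}$ is available (for example from a pilot pass).

```python
import math


def sv_mode_and_scale(x, y_next, phi, beta, sigma, iters=50, tol=1e-12):
    """Mode and curvature-based scale of x' -> log[g_{k+1}(x') q(x, x')]."""
    c = y_next**2 / (2.0 * beta**2)
    m = phi * x  # start Newton at the prior mean
    for _ in range(iters):
        e = c * math.exp(-m)
        grad = -0.5 + e - (m - phi * x) / sigma**2
        hess = -e - 1.0 / sigma**2  # strictly negative: the target is log-concave
        step = grad / hess
        m -= step
        if abs(step) < tol:
            break
    hess = -c * math.exp(-m) - 1.0 / sigma**2
    return m, math.sqrt(-1.0 / hess)  # scale = sqrt of inverted negative Hessian


def log_adapted_weight(x, y_next, phi, beta, sigma):
    """log of sigma_k(x) g[m_k(x)] q[x, m_k(x)], up to an additive constant."""
    m, s = sv_mode_and_scale(x, y_next, phi, beta, sigma)
    log_g = -0.5 * m - y_next**2 * math.exp(-m) / (2.0 * beta**2)
    log_q = -0.5 * ((m - phi * x) / sigma) ** 2
    return math.log(s) + log_g + log_q


def log_optimal_weight(x, y_next, filter_mean, phi, beta, sigma):
    """log of the right-hand side of (5.7), up to an additive constant."""
    m, s = sv_mode_and_scale(x, y_next, phi, beta, sigma)
    second_moment = s * s + m * m - 2.0 * m * filter_mean + filter_mean**2
    return log_adapted_weight(x, y_next, phi, beta, sigma) + 0.5 * math.log(second_moment)
```

Working on the log scale avoids underflow when the weights are later normalised across particles.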

In this setting, we conducted a numerical experiment where the two filters above were, 
again together with the bootstrap filter and the auxiliary filter based on the generic weights 
$t_k^{\mathrm{P\&S}}$, run for the parameters $(\phi, \beta, \sigma) = (0.9702, 0.5992, 0.178)$ (estimated by Pitt and 
Shephard, 1999b, from daily returns on the U.S. dollar against the U.K. pound sterling 
from the first day of trading in 1997 and for the next 200 days). To make the filtering 
problem more challenging, we used a simulated record $y_{0:10}$ of observations arising from 
the initial state $x_0 = 2.19$, being above the 2% quantile of the stationary distribution of 
$X$, implying a sequence of relatively impetuously fluctuating log-returns. The number of 
particles was set to $N = 5{,}000$ for all filters, and the number of particles used in the prefatory 
filtering pass (in which a rough approximation of $\phi_{k+1}\pi_{k+1}$ in (5.7) was computed using 
the bootstrap filter) of the SSAPF filter based on optimal first-stage weights was set to 
$R = N/5 = 1000$; thus, running the optimal first-stage weight filter is only marginally more 
demanding than running the fully adapted filter. The outcome is displayed in Figure 5. It 
is once more obvious that introducing approximate optimal first-stage weights significantly 
improves the performance also for the SV model, which is recognised as being especially 
demanding as regards state estimation. 



Fig. 5. Plot of MSE performances (on log-scale) of the bootstrap particle filter (*), the SSAPF based 
on optimal weights (□), the SSAPF based on the generic weights $t_k^{\mathrm{P\&S}}$ (o), and the fully adapted 
SSAPF (x) for the SV model in Section 5.2. The MSE values are computed using 5,000 particles 
and 400 runs of each algorithm. 

A. Proofs 

A.1. Proof of Theorem 3.1 

Let us recall the updating scheme described in Algorithm 1 and formulate it in the following 
four isolated steps: 

$$\begin{aligned}
\{(\xi_{0:k}^{N,i}, 1)\}_{i=1}^{N}
&\xrightarrow{\text{I: Weighting}} \{(\xi_{0:k}^{N,i}, \tau_k^{N,i})\}_{i=1}^{N}
\xrightarrow{\text{II: Resampling (1st stage)}} \{(\hat{\xi}_{0:k}^{N,i}, 1)\}_{i=1}^{M_N} \\
&\xrightarrow{\text{III: Mutation}} \{(\tilde{\xi}_{0:k+1}^{N,i}, \tilde{\omega}_{k+1}^{N,i})\}_{i=1}^{M_N}
\xrightarrow{\text{IV: Resampling (2nd stage)}} \{(\xi_{0:k+1}^{N,i}, 1)\}_{i=1}^{N},
\end{aligned} \tag{A.1}$$

where we have set $\hat{\xi}_{0:k}^{N,i} \triangleq \xi_{0:k}^{N, I_k^{N,i}}$, $1 \le i \le M_N$. Now, the asymptotic properties stated 
in Theorem 3.1 are established by a chain of applications of (Douc and Moulines, 2005, 
Theorems 1-4). We will proceed by induction: assume that the uniformly weighted particle 
sample $\{(\xi_{0:k}^{N,i}, 1)\}_{i=1}^{N}$ is consistent for $[L^1(\mathsf{X}^{k+1}, \phi_k), \phi_k]$ and asymptotically normal 
for $[\phi_k, \mathsf{A}_k, L^1(\mathsf{X}^{k+1}, \phi_k), \sigma_k, \phi_k, \{\sqrt{N}\}_{N=1}^{\infty}]$, with $\mathsf{A}_k$ being a proper set and $\sigma_k$ such that 
$\sigma_k(af) = |a| \sigma_k(f)$, $f \in \mathsf{A}_k$, $a \in \mathbb{R}$. We prove, by analysing each of the steps (I-IV), that 
this property is preserved through one iteration of the algorithm. 
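(I). Define the measure 

$$\mu_k(A) \triangleq \frac{\phi_k(t_k \mathbb{1}_A)}{\phi_k t_k}, \quad A \in \mathcal{X}^{\otimes(k+1)}.$$

By applying (Douc and Moulines, 2005, Theorem 1) for $R(x_{0:k}, \cdot) = \delta_{x_{0:k}}(\cdot)$, $L(x_{0:k}, \cdot) = 
t_k(x_{0:k})\, \delta_{x_{0:k}}(\cdot)$, $\mu = \mu_k$, and $\nu = \phi_k$, we conclude that the weighted sample $\{(\xi_{0:k}^{N,i}, \tau_k^{N,i})\}_{i=1}^{N}$ 
is consistent for $[\{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k |f| \in L^1(\mathsf{X}^{k+1}, \phi_k)\}, \mu_k] = [L^1(\mathsf{X}^{k+1}, \mu_k), \mu_k]$. Here 
the equality is based on the fact that $\phi_k(t_k |f|) = \mu_k |f| \, \phi_k t_k$, where the second factor on 
the right hand side is bounded by Assumption (A1). In addition, by applying (Douc and 
Moulines, 2005, Theorem 1) we conclude that $\{(\xi_{0:k}^{N,i}, \tau_k^{N,i})\}_{i=1}^{N}$ is asymptotically normal for 
$(\mu_k, \mathsf{A}_{I,k}, \mathsf{W}_{I,k}, \sigma_{I,k}, \gamma_{I,k}, \{\sqrt{N}\}_{N=1}^{\infty})$, where 

$$\begin{aligned}
\mathsf{A}_{I,k} &\triangleq \{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k |f| \in \mathsf{A}_k,\ t_k f \in L^2(\mathsf{X}^{k+1}, \phi_k)\} \\
&= \{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k f \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\}, \\
\mathsf{W}_{I,k} &\triangleq \{f \in L^1(\mathsf{X}^{k+1}, \mu_k) : t_k^2 |f| \in L^1(\mathsf{X}^{k+1}, \phi_k)\}
\end{aligned}$$

are proper sets, and 

$$\sigma_{I,k}^2(f) \triangleq \sigma_k^2\left[\frac{t_k (f - \mu_k f)}{\phi_k t_k}\right] = \frac{\sigma_k^2[t_k (f - \mu_k f)]}{(\phi_k t_k)^2}, \quad f \in \mathsf{A}_{I,k}, \qquad
\gamma_{I,k} f \triangleq \frac{\phi_k(t_k^2 f)}{(\phi_k t_k)^2}, \quad f \in \mathsf{W}_{I,k}.$$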



(II). By using (Douc and Moulines, 2005, Theorems 3 and 4) we deduce that $\{(\hat{\xi}_{0:k}^{N,i}, 1)\}_{i=1}^{M_N}$ 
is consistent for $[L^1(\mathsf{X}^{k+1}, \mu_k), \mu_k]$ and a.n. for $[\mu_k, \mathsf{A}_{II,k}, L^1(\mathsf{X}^{k+1}, \mu_k), \sigma_{II,k}, \beta\mu_k, \{\sqrt{N}\}_{N=1}^{\infty}]$, 
where 

$$\mathsf{A}_{II,k} \triangleq \{f \in \mathsf{A}_{I,k} : f \in L^2(\mathsf{X}^{k+1}, \mu_k)\} = \{f \in L^2(\mathsf{X}^{k+1}, \mu_k) : t_k f \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k)\}$$
is a proper set, and 



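(III). We argue as in step (I), but this time for $\nu = \mu_k$, $R = R_k^p$, and $L(\cdot, A) = 
R_k^p(\cdot, w_{k+1} \mathbb{1}_A)$, $A \in \mathcal{X}^{\otimes(k+2)}$, providing the target distribution 

$$\mu(A) = \frac{\mu_k R_k^p(w_{k+1} \mathbb{1}_A)}{\mu_k R_k^p w_{k+1}} = \frac{\phi_k H_k^u(A)}{\phi_k H_k^u(\mathsf{X}^{k+2})} = \phi_{k+1}(A), \quad A \in \mathcal{X}^{\otimes(k+2)}. \tag{A.2}$$

This yields, applying (Douc and Moulines, 2005, Theorems 1 and 2), that $\{(\tilde{\xi}_{0:k+1}^{N,i}, \tilde{\omega}_{k+1}^{N,i})\}_{i=1}^{M_N}$ 
is consistent for 

$$[\{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}),\ R_k^p(\cdot, w_{k+1}|f|) \in L^1(\mathsf{X}^{k+1}, \mu_k)\}, \phi_{k+1}] = [L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \phi_{k+1}], \tag{A.3}$$

where (A.3) follows, since $\mu_k R_k^p(w_{k+1}|f|)\, \phi_k t_k = \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}|f|$, from (A1), and a.n. 
for $(\phi_{k+1}, \mathsf{A}_{III,k+1}, \mathsf{W}_{III,k+1}, \sigma_{III,k+1}, \gamma_{III,k+1}, \{\sqrt{N}\}_{N=1}^{\infty})$. Here 

$$\begin{aligned}
\mathsf{A}_{III,k+1} &\triangleq \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|) \in \mathsf{A}_{II,k},\ R_k^p(\cdot, w_{k+1}^2 f^2) \in L^1(\mathsf{X}^{k+1}, \mu_k)\} \\
&= \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|) \in L^2(\mathsf{X}^{k+1}, \mu_k), \\
&\qquad t_k R_k^p(\cdot, w_{k+1}|f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k),\ R_k^p(\cdot, w_{k+1}^2 f^2) \in L^1(\mathsf{X}^{k+1}, \mu_k)\} \\
&= \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k), \\
&\qquad H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}
\end{aligned}$$

and 

$$\begin{aligned}
\mathsf{W}_{III,k+1} &\triangleq \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}^2 |f|) \in L^1(\mathsf{X}^{k+1}, \mu_k)\} \\
&= \{f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1}) : w_{k+1} f \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}
\end{aligned}$$

are proper sets. In addition, from the identity (A.2) we obtain that 

$$\mu_k R_k^p(w_{k+1} \Phi_{k+1}[f]) = 0,$$

where $\Phi_{k+1}$ is defined in (3.1), yielding 

$$\begin{aligned}
\sigma_{III,k+1}^2(f) &\triangleq \sigma_{II,k}^2\left\{\frac{R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])}{\mu_k R_k^p w_{k+1}}\right\} + \frac{\beta \mu_k R_k^p(\{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2} \\
&= \frac{\beta \mu_k(\{R_k^p(w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2} + \frac{\sigma_k^2\{t_k R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}}{(\phi_k t_k)^2 (\mu_k R_k^p w_{k+1})^2} \\
&\quad + \frac{\beta \mu_k R_k^p(\{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2)}{(\mu_k R_k^p w_{k+1})^2}.
\end{aligned}$$

Now, applying the equality 

$$\{R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2 + R_k^p(\cdot, \{w_{k+1}\Phi_{k+1}[f] - R_k^p(\cdot, w_{k+1}\Phi_{k+1}[f])\}^2) = R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])$$

provides the variance 

$$\sigma_{III,k+1}^2(f) = \frac{\beta\, \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])\}\, \phi_k t_k + \sigma_k^2\{H_k^u(\cdot, \Phi_{k+1}[f])\}}{[\phi_k H_k^u(\mathsf{X}^{k+2})]^2}, \quad f \in \mathsf{A}_{III,k+1}. \tag{A.4}$$

Finally, for $f \in \mathsf{W}_{III,k+1}$, 

$$\gamma_{III,k+1} f \triangleq \frac{\beta \mu_k R_k^p(w_{k+1}^2 f)}{(\mu_k R_k^p w_{k+1})^2} = \frac{\beta\, \phi_{k+1}(w_{k+1} f)\, \phi_k t_k}{\phi_k H_k^u(\mathsf{X}^{k+2})}.$$

(IV). The consistency for $[L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \phi_{k+1}]$ of the uniformly weighted particle sample 
$\{(\xi_{0:k+1}^{N,i}, 1)\}_{i=1}^{N}$ follows from (Douc and Moulines, 2005, Theorem 3). In addition, 
applying (Douc and Moulines, 2005, Theorem 4) yields that the same sample is a.n. for 
$[\phi_{k+1}, \mathsf{A}_{IV,k+1}, L^1(\mathsf{X}^{k+2}, \phi_{k+1}), \sigma_{IV,k+1}, \phi_{k+1}, \{\sqrt{N}\}_{N=1}^{\infty}]$, with 

$$\begin{aligned}
\mathsf{A}_{IV,k+1} &\triangleq \{f \in \mathsf{A}_{III,k+1} : f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1})\} \\
&= \{f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1}) : R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|) \in L^1(\mathsf{X}^{k+1}, \phi_k), \\
&\qquad H_k^u(\cdot, |f|) \in \mathsf{A}_k \cap L^2(\mathsf{X}^{k+1}, \phi_k),\ w_{k+1} f^2 \in L^1(\mathsf{X}^{k+2}, \phi_{k+1})\}
\end{aligned}$$

being a proper set, and, for $f \in \mathsf{A}_{IV,k+1}$, 

$$\sigma_{IV,k+1}^2(f) \triangleq \phi_{k+1}\Phi_{k+1}^2[f] + \sigma_{III,k+1}^2(f),$$

with $\sigma_{III,k+1}^2(f)$ being defined by (A.4). This concludes the proof of the theorem. 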
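
The equality invoked just before (A.4) is the familiar splitting of a second moment into a squared mean plus a variance, applied pointwise to the Markov kernel $R_k^p$. A short derivation, writing $h$ as shorthand (introduced here, not in the original) for $w_{k+1}\Phi_{k+1}[f]$ and using only linearity and $R_k^p(\cdot, 1) = 1$:

```latex
\begin{aligned}
R_k^p\bigl(\cdot, \{h - R_k^p(\cdot, h)\}^2\bigr)
  &= R_k^p(\cdot, h^2) - 2\, R_k^p(\cdot, h)\, R_k^p(\cdot, h) + \{R_k^p(\cdot, h)\}^2 \\
  &= R_k^p(\cdot, h^2) - \{R_k^p(\cdot, h)\}^2 .
\end{aligned}
```

Moving the squared mean to the left-hand side gives the stated identity; the inner term $R_k^p(\cdot, h)$ is evaluated at the same point as the outer kernel and is therefore constant under the integration.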
A.2. Proof of Corollary 3.1 

We pick $f \in L^2(\mathsf{X}^{k+2}, \phi_{k+1})$ and prove that the constraints of the set $\mathsf{A}_{k+1}$ defined in (3.2) 
are satisfied under Assumption (A3). Firstly, by Jensen's inequality, 

$$\begin{aligned}
\phi_k[R_k^p(\cdot, w_{k+1}|f|)\, H_k^u(\cdot, |f|)] &= \phi_k\{t_k [R_k^p(\cdot, w_{k+1}|f|)]^2\} \\
&\le \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 f^2)\} \\
&\le \|w_{k+1}\|_{\mathsf{X}^{k+2},\infty}\, \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}(f^2) < \infty,
\end{aligned}$$

and similarly, 

$$\phi_k\{[H_k^u(\cdot, |f|)]^2\} \le \|g_{k+1}\|_{\mathsf{X},\infty}\, \phi_k H_k^u(\mathsf{X}^{k+2})\, \phi_{k+1}(f^2) < \infty.$$

From this, together with the bound 

$$\phi_{k+1}(w_{k+1} f^2) \le \|w_{k+1}\|_{\mathsf{X}^{k+2},\infty}\, \phi_{k+1}(f^2) < \infty,$$

we conclude that $\mathsf{A}_{k+1} = L^2(\mathsf{X}^{k+2}, \phi_{k+1})$. 

To prove $L^2(\mathsf{X}^{k+1}, \phi_k) \subseteq \mathsf{A}_k$, note that assumption (A3) implies $\mathsf{W}_k = L^1(\mathsf{X}^{k+1}, \phi_k)$ 
and repeat the arguments above. 



A.3. Proof of Theorem 3.3 

Define, for $r \in \{1, 2\}$ and $R_N(r)$ as defined in Theorem 3.3, the particle measures 

1=1 ' "fc i=l 

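playing the role of approximations of the smoothing distribution $\phi_k$. Let $\mathcal{F}_0 \triangleq \sigma(\xi_0^{N,i}; 1 \le 
i \le N)$; then the particle history up to the different steps of loop $m + 1$, $m \ge 0$, of 
Algorithm $r$, $r \in \{1, 2\}$, is modeled by the filtrations $\hat{\mathcal{F}}_m \triangleq \mathcal{F}_m \vee \sigma[I_m^{N,i}; 1 \le i \le R_N(r)]$, 
$\tilde{\mathcal{F}}_{m+1} \triangleq \hat{\mathcal{F}}_m \vee \sigma[\tilde{\xi}_{0:m+1}^{N,i}; 1 \le i \le R_N(r)]$, and 

$$\mathcal{F}_{m+1} \triangleq \begin{cases} \tilde{\mathcal{F}}_{m+1} \vee \sigma(J_{m+1}^{N,i}; 1 \le i \le N), & \text{for } r = 1, \\ \tilde{\mathcal{F}}_{m+1}, & \text{for } r = 2, \end{cases}$$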

respectively. In the coming proof we will describe one iteration of the APF algorithm by 
the following two operations. 



r/ t JV,i JV,ix-,JV Sampling from ip k+1 - N <z -iv^xirtjv 
X^O-.k^k )h=l , il?0:fc+l ,W *+l/J't=l 



fc+l7Ji=l 
r — 1: Sampling from (j>^. k+1 



where, for A e #®( fe + 2 ), 
4W = '(^i^ 



JV 

E 



, ,N,j N,j 
u k r fc 



(A.5) 
for some index $i \in \{1, \ldots, R_N(r)\}$ (given $\mathcal{F}_k$, the particles, $1 \le i \le R_N(r)$, are i.i.d.). 

Here the initial weights $\{\omega_k^{N,i}\}_{i=1}^{N}$ are all equal to one for $r = 1$. The second operation is 
valid since, for any $i_0 \in \{1, \ldots, N\}$, 

fiw(r) ~ N,j 

j= i "fe+i 

The fact that the evolution of the particles can be described by two Monte Carlo opera- 
tions involving conditionally i.i.d. variables makes it possible to analyse the error using the 
Marcinkiewicz-Zygmund inequality (see Petrov, 1995, p. 62). 
Using this, set, for $1 \le k \le n$, 

r r]n N 

c$(A) 4 / ^-(x 0:fc ) ^(dx 0:k ) , A e *®(*+» , (A.6) 

with, for x Q . k e X fe+1 , 

da£ A w k (x .. k )HZ---HZ_ 1 (x 0:k ,X n + 1 )<l>Z_ 1 t k - 1 

^ N k ^_ 1 ^_ 1 ---^_ 1 (X«+i) 

Here we apply the standard convention if" • • • = Id if m < £ For k = we define 

ao (A)4 /* ^(a;o)?(da;o), A G X , 

J A d< ? 



with, for x G X, 



da , \ a w (x Q )HX ■ ■ ■ H-_ 1 (x ,X n + 1 ) 



v uy % tf u ---^-i(-,X" +1 )] • 

Similarly, put, for < k < n — 1, 

ft (A) = I %M ft (too*) , A e Af^+D , (A.7) 



where, for x 0:k € X fe+1 , 



d/3f, ^•■•^ 1 ( %t ,X«+ 1 ) 



#f «---^-i(X" +1 ) 

The following powerful decomposition is an adaptation of a similar one derived by 
Olsson et al. (2005, Lemma 7.2) (the standard SISR case), being in turn a refinement of a 
decomposition originally presented by Del Moral (2004). 

Lemma A.1. Let $n \ge 0$. Then, for all $f \in \mathcal{B}_b(\mathsf{X}^{n+1})$, $N \ge 1$, and $r \in \{1, 2\}$, 

$$\phi_n^N f - \phi_n f = \sum_{k=1}^{n} A_k^N(f) + \mathbb{1}\{r = 1\} \sum_{k=0}^{n-1} B_k^N(f) + C^N(f), \tag{A.8}$$
where 

ES'"Sv(gm,.[/](g) 

= "V ^-W. 

s~iN ( f\ A Z^i=l -^-(^0 ' )^0:n[/](^0 ' ) , , r,, 
C \t)- ^JV d&ftNss <PnV0:nlt\ , 

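and the operators $\Psi_{k:n} : \mathcal{B}_b(\mathsf{X}^{n+1}) \to \mathcal{B}_b(\mathsf{X}^{k+1})$, $0 \le k \le n$, are, for some fixed points 
$\hat{x}_{0:k} \in \mathsf{X}^{k+1}$, defined by 

$$\Psi_{k:n}[f] : x_{0:k} \mapsto \frac{H_k^u \cdots H_{n-1}^u f(x_{0:k})}{H_k^u \cdots H_{n-1}^u(x_{0:k}, \mathsf{X}^{n+1})} - \frac{H_k^u \cdots H_{n-1}^u f(\hat{x}_{0:k})}{H_k^u \cdots H_{n-1}^u(\hat{x}_{0:k}, \mathsf{X}^{n+1})}.$$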



Proof of Lemma A.l. Consider the decomposition 



fe=l L^fc 
+ l{r = l}^ 



Hn-lf 



'^-i(X 



'fe-i-^fc-i 



W fe ■■■ H n-lJ 



^•••^-i(X" +1 ) 



ia ; 



V ITU ITU -/* 

^•••^_ 1 (X-+i) 



+ - 



iV ITU 17 11 £ 

H ■■ ■ Un-lJ 



N TTU 

^0 ' 



•H-_ 1 (X»+i) 



in/ 



We will show that the three parts of this decomposition are identical with the three parts 
of (A. 8). For k > 1 it holds that, using the definitions (A. 5) and (A. 6) of tp£ and a£ , 
respectively, and following the lines of Olsson et al. (2005, Lemma 7.2), 



LN TJU ITU 7TU £ 



4>k-i H k-i 



* N k 



^-i(X" +1 ) 

^(■^■••^-lffl(tf-ltt-l) 

^-i^-i ■■■-ffS-i(x n+1 ) 

Wfe (-)ff fe u ---^_ 1 (-,X"+ 1 )(^_ 1 i fe _ 1 ) 



P k-l n k-l 



*fc:n [/](■) + 



•^-i(X" +1 ) 
H^--H-_J(x Q ..k) 



= a»*k:n[f} + 



H%---HZ_ 1 (x ..k,X n + 1 ) 
H- ■ ■ ■ H-_ 1 (x :k^ n+1 ) ' 



**:«[/](•) 



H^Jixo-.k) 



■ H^xo-.k^) 
Moreover, by definition, 

^•••i^_ 1 (X"+i) 
yielding 



£f=l 



+ 



2^.7=1 



^■■■S5_l(X0:fc,X»+ 1 ) ' 



t>k-i H k-i 



Hn-lf 



A N Af) ■ 



^H- ■ ■ ■ H^iXn+i) ^_ 1 ^ fe u _ 1 • • • ^-i(X" +1 ) 
Similarly, for r = 1, using the definition (A. 7) of 



n 0:k n k-l 



■H-_ 1 (X«+i) 



N 



= /?f* fe: „[/] + 



^•••^ U -l(^0: fc ,X«+l) 

ff£---ff£-i/(&Q:0 

^■■■fl2_i(io: fc ,x«+i) ' 



and applying the obvious relation 



we obtain the identity 



^■■■flS_l(X0:fc,X»+l) 



iV ITU 1/ 11 J? 

fc W fe ' ■■ H n-lJ 



The equality 



<$HZ---HZ_ 1 f 
^H^...H-_ 1 (Xn+^) 



follows analogously. This completes the proof of the lemma. 



□ 



Proof of Theorem 3.3. From here the proof is a straightforward extension of (Olsson et 
al., 2005, Proposition 7.1). To establish part (i), observe that: 

• A trivial adaptation of (Olsson et al., 2005, Lemmas 7.3 and 7.4) gives that 



ll*fc=n[/i]|| 



XM-i oo < 0SC 



(fi)p° Vii 



-k) 



da? 



< 



\ w k\\x*+\oo \fa-lW 



X fe ,c 



X*+ 1 ,oo 



(A.9) 
• By mimicking the proof of (Olsson et al., 2005, Proposition 7.1(i)), that is, applying 
the identity $a/b - c = (a/b)(1 - b) + a - c$ to each $A_k^N(f_i)$ and using twice the 
Marcinkiewicz-Zygmund inequality together with the bounds (A.9), we obtain the 
bound 



osc(/Q||w fc || xfc+1;00 ||^-iH xfcoo ov(j _ fe) 
Hg k (l-p)e- 



where $B_p$ is a constant depending on $p$ only. We refer to (Olsson et al., 2005, 
Proposition 7.1) for details. 

• For $r = 1$, inspecting the proof of (Olsson et al., 2005, Lemma 7.4) yields immediately 



N 



d/3£ 



< 



i 



X <= + i 



1-p 



and repeating the arguments of the previous item for $B_k^N(f_i)$ yields 

osc(/0^ 0v(4 _ fc) 



^V||sf(/,)|| p <B P ^P 0V(4 
• The arguments above apply directly to $C^N(f_i)$, providing 



We conclude the proof of (i) by summing up. 

The proof of (ii) (which mimics the proof of (Olsson et al., 2005, Proposition 7.1(ii))) 
follows analogous lines; indeed, repeating the arguments of (i) above for the decomposition 
$a/b - c = (a/b)(1 - b)^2 + (a - c)(1 - b) + c(1 - b) + a - c$ gives us the bounds 



R N (r) \E [A% (fi)] | < B 



, OSc(/ t ) H^fcllxfc+i.oo ll*fc-l|lx*,oo ny(i-fc) 



( M5fc )2(l-p)2 e 2 



N\E[B?(f t )]\<B0± P ^-V 
osc(/i) ||tuo||x j00 t 



N |E [C w (/,)] | < B 



(^o) 2 (l - P? 



P ■ 



We again refer to (Olsson et al., 2005, Proposition 7.1(ii)) for details, and summing up 
concludes the proof. □ 
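
Both algebraic decompositions used in the proof, the two-term one for the $L^p$ bound and the four-term one for the bias bound, can be confirmed mechanically. The snippet below is only a sanity check with exact rational arithmetic; the function names and the sample values of $a$, $b$, $c$ are ours.

```python
from fractions import Fraction


def first_order_split(a, b, c):
    """Right-hand side of a/b - c = (a/b)(1 - b) + a - c."""
    return (a / b) * (1 - b) + a - c


def second_order_split(a, b, c):
    """Right-hand side of the four-term decomposition used for the bias bound."""
    return (a / b) * (1 - b) ** 2 + (a - c) * (1 - b) + c * (1 - b) + a - c


# Exact rationals, so equality holds without any rounding tolerance.
a, b, c = Fraction(7, 3), Fraction(5, 4), Fraction(-2, 9)
assert first_order_split(a, b, c) == a / b - c
assert second_order_split(a, b, c) == a / b - c
```

The second identity isolates a term quadratic in $1 - b$, which is what allows the bias to be bounded at the faster rate.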



A.4. Proof of Theorem 4.1 

The statement is a direct implication of Hölder's inequality. Indeed, let $t_k$ be any first-stage 
importance weight function and write 

$$(\phi_k t_k^*[f])^2 = \{\phi_k(t_k^{1/2}\, t_k^{-1/2}\, t_k^*[f])\}^2 \le \phi_k t_k \, \phi_k\{t_k^{-1} (t_k^*[f])^2\}. \tag{A.10}$$

Now the result follows by the formula (3.3), the identity 

$$\phi_k\{t_k^{-1} (t_k^*[f])^2\} = \phi_k\{t_k R_k^p(\cdot, w_{k+1}^2 \Phi_{k+1}^2[f])\},$$

and the fact that we have equality in (A.10) for $t_k = t_k^*[f]$. 
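
The inequality (A.10) and its equality case can be illustrated on a finite state space. The sketch below is a toy check rather than part of the proof: the smoothing distribution is replaced by a hypothetical discrete probability vector `phi`, `t_star` plays the role of $t_k^*[f]$, and the function name is ours.

```python
def variance_term(phi, t, t_star):
    """Compute (phi t) * phi(t_star^2 / t) for a discrete probability vector phi."""
    mass = sum(p * ti for p, ti in zip(phi, t))
    second = sum(p * s * s / ti for p, ti, s in zip(phi, t, t_star))
    return mass * second


phi = [0.2, 0.3, 0.5]     # hypothetical discrete law standing in for phi_k
t_star = [1.0, 2.0, 0.5]  # hypothetical values of the optimal weight function

# Lower bound (phi t_star)^2 on the left-hand side of (A.10).
lower = sum(p * s for p, s in zip(phi, t_star)) ** 2

at_optimum = variance_term(phi, t_star, t_star)       # attains the bound
at_uniform = variance_term(phi, [1.0, 1.0, 1.0], t_star)  # bootstrap-type weights
```

Here `at_optimum` coincides with `lower`, while uniform weights give a strictly larger value, in line with the equality case of the Cauchy-Schwarz form of Hölder's inequality.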

Acknowledgment. The authors are grateful to Olivier Cappé, who provided sensible 
comments on our results that improved the presentation of the paper. 